4. Changes and Additions
4.1 Changes and Additions in MineSet 2.0.1
4.1.1 General Changes
+o The Associations Rule Generator now accepts input from
flat files as well as databases. The Tool Manager
interface for Associations has been changed to support
this and to simplify working with Associations. The
"assoccvt" utility is now run automatically and
invisibly to create "assoc" binary files, so the
buttons for creating and selecting these binary files
have been removed. N.B. If you wish to run the Rule
Visualizer without running Associations, you can do so
using the Tool Manager's "Visual Tools" menu.
+o The client now reads MineSet binary files considerably
faster than in version 2.0.
+o MineSet 2.0.1 complies with the X/Open guidelines for
dates past the year 2000. Previous versions of MineSet
already used 4-digit year fields for ASCII output and
an internal date/time format that handles dates well
beyond 2000. The only change from previous versions is
that when MineSet reads externally prepared ASCII files
in which dates have a 2-digit year format, year fields
are interpreted with 00-68 mapping to 2000-2068 and
69-99 mapping to 1969-1999.
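The two-digit year rule above can be illustrated with a short Python sketch. It is not MineSet code, and the helper name expand_two_digit_year is hypothetical. It applies the 00-68/69-99 pivot and cross-checks it against Python's time.strptime, whose %y conversion follows the same POSIX/X/Open convention.

```python
import time


# Illustrative sketch only; expand_two_digit_year is a hypothetical helper,
# not part of MineSet.
def expand_two_digit_year(yy: int) -> int:
    """Expand a 2-digit year: 00-68 -> 2000-2068, 69-99 -> 1969-1999."""
    if not 0 <= yy <= 99:
        raise ValueError(f"not a two-digit year: {yy}")
    return 2000 + yy if yy <= 68 else 1900 + yy


if __name__ == "__main__":
    for yy in (0, 68, 69, 99):
        # time.strptime's %y applies the same POSIX/X/Open pivot.
        parsed = time.strptime(f"{yy:02d}", "%y").tm_year
        print(f"{yy:02d} -> {expand_two_digit_year(yy)} (strptime: {parsed})")
```

The boundary cases are the interesting ones: 68 becomes 2068 while 69 becomes 1969.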
4.2 Changes and Additions in MineSet 2.0
4.2.1 General Changes
+o All the visual tools except for the Rule Visualizer and
Evidence Visualizer support multiple selection,
allowing selection of multiple objects in the scene.
The data associated with all selected objects may be
viewed by choosing "Selections/Show Values" from the
tool's menu. For most visual tools, multiple selection
is accomplished using Shift-Left mouse click. (In the
Splat Visualizer it is accomplished by drawing a box
around the selections.)
+o All the visual tools except for the Rule Visualizer
support "Drill Through". This allows you to select one
or more objects and send a request to the Tool Manager
to fetch the original data. There are two options.
"Selections/Show Original Data" tells the Tool Manager
to bring up a table of the original data that resulted
in the selections, while "Selections/Send to Tool
Manager" tells the Tool Manager to insert a filter
operation, allowing the user to launch other
visualizations or mining tools on the selected data.
+o A new tool, the Splat Visualizer (splatviz), aggregates
large amounts of data, and displays it using
transparent graphical objects (splats). Using this
tool one can interactively view data which has very
many records.
+o A Statistics Visualizer displays basic statistics of
the data, including mean, standard deviation,
quartiles, number of values, and histograms. The
Statistics Visualizer is built into the Tool Manager.
+o A Record Viewer replaces the Text Editor for viewing
MineSet data files. This displays the data in tabular
form.
+o MineSet data files now default to a more compact,
faster-to-read binary format. The ASCII format is
still supported and may be specified via the Tool
Manager Preferences panel.
+o The visual tools can save and print images of
themselves. (However, in Release 2.0/2.0.1, due to a
limitation in the implementation, this functionality is
only available when displaying on a Silicon Graphics
workstation. See the "Known Problems and Workarounds"
section for more details.)
+o The visual tools' Animation Panel has three new buttons,
below the row of VCR-style buttons, that control the play mode:
Play-Once, Loop, and Swing. In the default Play-Once
mode, the animation follows the drawn path from
beginning to end (or end to beginning, for Play
Reverse) and stops. In Loop mode, the animation
follows the drawn path from beginning to end (or end to
beginning), then seamlessly and indefinitely repeats.
In Swing mode, the animation follows the drawn path
from beginning to end, then backward from the end to
the beginning, then again from beginning to end, ad
infinitum.
+o All configuration files now include a version number
"MineSet 2.0" as the first line.
+o A symbolic link was added so that /usr/lib/mineset can
be used in place of /usr/lib/MineSet.
+o Several of the images have been moved from
MineSet_common to MineSet.
+o The utilities mineset2sas and sas2mineset have been
added for converting files between MineSet and SAS
format.
+o Setting the environment variable MINESET_WARN_EXECUTE
will have the same effect as launching all visual tools
with the -warnexecute option, and will cause the visual
tools to issue a warning before executing a
user-specified command.
+o A -quiet option has been added to the visual tools. If
this option is specified, the tools will not pop up
dialogs when they are busy. This can be turned on
permanently by adding the line
*minesetQuiet:TRUE
to your .Xdefaults file.
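The three Animation Panel play modes described in the list above (Play-Once, Loop, and Swing) can be modeled as sequences of positions along the drawn path. This Python sketch assumes the path is sampled as n discrete positions numbered 0 to n-1; the function names are hypothetical and do not correspond to any MineSet interface.

```python
from itertools import islice
from typing import Iterator


# Illustrative sketch only: the path is modeled as n discrete positions
# (an assumption), and these function names are hypothetical.
def _check(n: int) -> None:
    if n < 1:
        raise ValueError("the path needs at least one position")


def play_once(n: int, reverse: bool = False) -> Iterator[int]:
    """Play-Once: visit positions 0..n-1 (or n-1..0 for Play Reverse), then stop."""
    _check(n)
    yield from (range(n - 1, -1, -1) if reverse else range(n))


def loop(n: int, reverse: bool = False) -> Iterator[int]:
    """Loop: repeat the one-way pass indefinitely."""
    _check(n)
    while True:
        yield from play_once(n, reverse)


def swing(n: int) -> Iterator[int]:
    """Swing: forward to the end, back to the beginning, forward again, forever.

    Turnaround positions are not visited twice in a row.
    """
    _check(n)
    forward = list(range(n))
    backward = forward[-2:0:-1]  # interior positions, walked end -> beginning
    while True:
        yield from forward
        yield from backward


if __name__ == "__main__":
    print(list(play_once(4)))          # [0, 1, 2, 3]
    print(list(islice(loop(4), 10)))   # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
    print(list(islice(swing(4), 10)))  # [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
```

Loop jumps from the end straight back to the beginning, while Swing reverses direction at each end.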
4.2.2 Changes and Additions to the Tree Visualizer
+o Because Shift-Left mouse click is now used for multiple
selection, you must use the Control key to indicate
that a zoom is not to take place.
+o When a bar is selected, the view zooms to show the
complete base on which the bar rests rather than only
the individual bar. Clicking on any bar on a given base
zooms to the same location as clicking on the base
itself.
+o The Filter Panel now contains filtering criteria
similar to the Search Panel, but it filters out the
nodes that don't match rather than highlighting those
that do.
+o In the Main window, clicking Mouse button 3 can bring
up a menu to select the children of a node. If you
click on a node with children, it will give you a list
of the children of that node. If you do not click on a
node, but a node is selected, it will give you a list
of children of the selected node. If nothing is
selected, or if the selected node has no children, no
menu will be displayed.
+o New external Control buttons have been added to move to
the sibling to the left or right of the current
selection, to move to the first or last child of the
current selection, or to provide a list of children of
the current selection. All of these except the list of
children have also been added to the Go menu.
+o The distinction between scale and max has been
eliminated in the configuration file. Scale is now the
recommended option, and can be used wherever max was
previously required. For compatibility, max can also
be used wherever scale can be used.
+o The execute statement can now be specified via the tool
options in the Tool Manager.
+o The Search Panel now has a "Select" button which will
select everything that matched the previous search.
4.2.3 Changes and Additions to the Scatter Visualizer
+o The Scatter Visualizer now supports an execute
statement similar to the Tree and Map Visualizers.
This can be specified in the Tool Manager or edited
directly into the configuration file.
+o The Filter Panel has been moved from the Filter menu to
the View menu. "Set Landscape to Filter" has been
renamed "Scale to filter", moved into the Filter Panel,
and defaults to on.
+o For users familiar with Inventor, it is possible to
turn on the Inventor menu by adding the X resource
*minesetInventorMenu:True
to your .Xdefaults file.
+o Spin animation can be enabled or disabled by adding
the X resource
Scatterviz*SoXtExaminerViewer.spinAnimation: on/off
to your .Xdefaults file, choosing either on or off.
4.2.4 Changes and Additions to the Map Visualizer
+o The execute statement, the "map outlines" geo hierarchy
file, and the "color normalize" statement can now be
specified via the tool options in the Tool Manager.
+o The View menu now supports a Filter Panel.
+o The Selections menu supports the customary options seen
in the other tools ("Show Values", "Show Original Data",
"Send To Tool Manager", and "Complementary Drill Through"),
and in addition supports "Select All" (all the objects in
the scene become selected).
4.2.5 Changes and Additions to the Splat Visualizer
+o Spin animation can be enabled or disabled by adding
the X resource
Splatviz*SoXtExaminerViewer.spinAnimation: on/off
to your .Xdefaults file, choosing either on or off.
+o The textured splats are now MUCH faster.
4.2.6 Changes and Additions to the Rule Visualizer
+o It is now possible to customize the axis labels in
ruleviz. A new option has been added to the
configuration file format. The syntax is:
item labels <leftLabel> <rightLabel>;
For example:
item labels "LHS" "RHS";
produces the default labels. The example file
/usr/lib/MineSet/ruleviz/examples/category.ruleviz
uses this option.
4.2.7 Changes and Additions to the Data Mover
+o The Data Mover no longer uses the Oracle-provided
library libclnsh.so to connect to Oracle databases. Because
of this, there is no longer a need for a local Oracle
installation when MineSet is to access a remote Oracle
database.
+o The Data Mover now reads and writes files in the
MineSet binary file format in addition to the ASCII
format.
+o Filtering, i.e., allowing only records satisfying a
specified condition to pass, is now supported as
a streaming operation.
+o Random sampling of records is now supported as a
streaming operation. This comes in two forms, one in
which the user specifies a desired resulting sample
size, and one in which the user specifies an
approximate percentage of records to include in the
sample (accept records with probability p).
+o The Data Mover now accumulates basic statistical
information on a data source. The resulting data is
used to support the Statistics Visualizer.
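The streaming filter and the two forms of random sampling described above can be sketched as single-pass operations over a record stream. The notes do not say which algorithms the Data Mover uses: this Python sketch assumes reservoir sampling (Algorithm R) for the fixed-size form and an independent accept/reject trial per record for the probability-p form. All function names are hypothetical.

```python
import random
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Illustrative sketch only; these function names are hypothetical, and the
# sampling algorithms are assumptions, not the Data Mover's documented method.


def stream_filter(records: Iterable[T], condition: Callable[[T], bool]) -> Iterator[T]:
    """Pass on only the records that satisfy `condition`, one at a time."""
    for record in records:
        if condition(record):
            yield record


def bernoulli_sample(records: Iterable[T], p: float, rng: random.Random) -> Iterator[T]:
    """Accept each record independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    for record in records:
        if rng.random() < p:
            yield record


def reservoir_sample(records: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Return a uniform random sample of k records (all of them if there are fewer)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)
        else:
            # randint is inclusive, so record i is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    rng = random.Random(42)
    rows = range(1, 100_001)
    # Filtering feeds sampling as a stream; filtered rows are never collected.
    evens = stream_filter(rows, lambda r: r % 2 == 0)
    print(len(reservoir_sample(evens, 100, rng)))  # 100
    approx = list(bernoulli_sample(range(10_000), 0.1, rng))
    print(len(approx))  # close to 1,000, but varies with the seed
```

Because the probability-p form decides each record independently, the resulting sample size is only approximately p times the number of records, which matches the "approximate percentage" wording in the notes.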
4.2.8 Changes and Additions to the Analytical Mining Tools
+o An Option Tree Inducer and Classifier have been added
to the set of inducers available under the Mining Tools
Classify tab.
+o The classifiers and inducers have been extended to work
with record weights.
+o The classifiers and inducers can now use a
user-specified loss matrix that indicates the loss (or cost)
associated with various types of classification errors.
+o Generating a learning curve has been added as a new
classifier mode. A learning curve assesses how the
classifier's error rate is affected by the number of
training records.
+o Accuracy estimation has been changed to error
estimation. The "Estimate error" mode now generates a
model from the whole dataset in addition to estimating
the error using cross validation.
+o Decision Trees and Option Trees now show the estimated
error for every node, allowing users to better
understand where the model is more accurate and where
it is not. This estimate is now mapped to color,
replacing the purity mapping used in MineSet 1.X.
+o The inducers now generate classifiers that are capable
of estimating probabilities (scoring), not just
classifying records. This option is available through
the apply-classifier transformation.
+o Lift curves, showing the effectiveness of the
probability estimates, can be generated from "Further
inducer options" and under Apply Classifier's "test
classifier" option. Lift curves show how effectively a
classifier can distinguish a specified label value from
all other label values.
+o Confusion matrices, showing the specific types of
errors that the classifier makes, can be generated from
"Further inducer options" and under Apply Classifier's
"test classifier" option.
+o It is now possible to backfit the test data into the
classifier after estimating the classifier's accuracy.
This mode is on by default and can be changed in
"Further inducer options". It allows users to see the
actual record counts/weights, rather than only those
from the training set. Fitting the test data into a
classifier updates the probability estimates without
altering the structure of the classifier. Backfitting
can reduce the error rate.
+o The apply classifier options have been extended to
allow testing a classifier against a test set and
fitting new data to previously created classifiers.
Fitting new data can be useful if large amounts of data
are available: a model can be built using a sample and
the bigger dataset can be used to update the model
counts and probability estimates.
+o The Laplace correction for the Evidence Inducer now
supports an automatic correction that has been
empirically determined to be more accurate in many
real-world datasets.
+o The "Automatic column selection" in the Evidence Inducer
now supports a faster "forward" mode.
+o Uniform Weight has been added to the set of automatic
binning approaches. Under uniform weight binning,
thresholds are identified that partition the records
into subsets of equal weight.
+o It is now possible to trim a specified percent of the
most extreme values prior to generating uniform range
or uniform weight bins.
+o The binning panel now supports using the training set
only, weighted records, and automatic determination of
weight per bin.
+o The time for automatic (entropy-based) binning has been
reduced by a factor of about 15-20. This dramatically
reduces the running time of the Evidence Inducer, and of
the binning panel when automatic binning is used.
+o Reading time (initial loading of data passed by
datamove) has been reduced by about 20-25%.
+o Classification models now require only the attributes
they actually use in order to be applied to new data.
For example, if a decision tree uses only three
attributes, only those three are required to apply it.
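The confusion matrix and the user-specified loss matrix described above fit together as follows: each cell of the confusion matrix counts records with a given actual and predicted label, and the loss matrix assigns a cost to each such cell. This Python sketch uses the standard minimum-expected-loss rule to show how a loss matrix can change a prediction; the notes do not say exactly how MineSet applies the loss matrix, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

# Illustrative sketch only; function names are hypothetical, and the
# minimum-expected-loss rule is a standard technique, not MineSet's documented one.


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are actual labels and columns are predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def total_loss(cm: List[List[int]], loss: List[List[float]]) -> float:
    """Sum of count * loss over every (actual, predicted) cell."""
    return sum(count * cost
               for cm_row, loss_row in zip(cm, loss)
               for count, cost in zip(cm_row, loss_row))


def min_expected_loss_label(probs: Sequence[float], loss: List[List[float]],
                            labels: Sequence[str]) -> str:
    """Predict the label p minimizing sum over actual a of P(a) * loss[a][p]."""
    k = len(labels)
    expected = [sum(probs[a] * loss[a][p] for a in range(k)) for p in range(k)]
    return labels[min(range(k), key=lambda p: expected[p])]


if __name__ == "__main__":
    labels = ["no", "yes"]
    cm = confusion_matrix(["no", "no", "yes", "yes", "yes"],
                          ["no", "yes", "yes", "no", "yes"], labels)
    print(cm)  # [[1, 1], [1, 2]]
    # Missing a true "yes" costs 5; a false "yes" costs 1; correct answers cost 0.
    loss = [[0, 1],
            [5, 0]]
    print(total_loss(cm, loss))  # 6
    # With P(yes) = 0.2, "yes" still has the lower expected loss (0.8 vs 1.0).
    print(min_expected_loss_label([0.8, 0.2], loss, labels))  # yes
```

With a symmetric 0/1 loss matrix this rule reduces to predicting the most probable label; an asymmetric matrix shifts predictions toward the label whose misclassification is costlier.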
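A lift curve for a chosen label value can be computed from the classifier's probability estimates. The records are ranked from most to least likely to have that label, and the label's frequency in each top fraction is compared with its overall frequency. This Python sketch uses that common cumulative-lift definition as an assumption; MineSet's exact plotting conventions are not described in these notes, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

# Illustrative sketch only; lift_curve is a hypothetical name, and this
# cumulative-lift definition is an assumption about what MineSet plots.


def lift_curve(scores: Sequence[float], is_target: Sequence[bool],
               points: int = 10) -> List[Tuple[float, float]]:
    """Return (depth, lift) pairs at `points` evenly spaced depths.

    scores:    estimated probability that each record has the target label
    is_target: whether each record actually has the target label
    lift(d) = target rate among the top fraction d of records / overall rate
    """
    n = len(scores)
    total_targets = sum(is_target)
    if n == 0 or total_targets == 0:
        raise ValueError("need at least one record with the target label")
    # Rank records from highest to lowest score (ties keep input order).
    ranked = [t for _, t in sorted(zip(scores, is_target), key=lambda st: -st[0])]
    base_rate = total_targets / n
    curve = []
    for i in range(1, points + 1):
        depth = i / points
        top = max(1, round(depth * n))
        curve.append((depth, (sum(ranked[:top]) / top) / base_rate))
    return curve


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    is_target = [True, True, False, True, False, False, False, False, False, False]
    for depth, lift in lift_curve(scores, is_target, points=5):
        print(f"top {depth:.0%}: lift {lift:.2f}")
```

A lift well above 1 at small depths means the classifier concentrates the chosen label value near the top of its ranking; at 100% depth the lift is always 1.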
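Uniform-weight binning and trimming, as described above, can be sketched in Python. This sketch places thresholds where the cumulative record weight crosses each multiple of (total weight / number of bins). As an assumption not stated in the notes, it trims by dropping half of the requested percentage of records (by count) from each end of the sorted values. A uniform-range helper is included for comparison; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, float]  # (value, weight)

# Illustrative sketch only; function names are hypothetical, and the
# trimming rule (half from each end, by record count) is an assumption.


def trim_extremes(records: Sequence[Record], trim_percent: float) -> List[Record]:
    """Drop trim_percent of the records, half from each end of the sorted values."""
    ordered = sorted(records)
    cut = int(len(ordered) * trim_percent / 100 / 2)
    return ordered[cut:len(ordered) - cut]


def uniform_range_thresholds(values: Sequence[float], bins: int) -> List[float]:
    """Thresholds that split [min, max] into `bins` intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(1, bins)]


def uniform_weight_thresholds(records: Sequence[Record], bins: int) -> List[float]:
    """Thresholds that split records into `bins` subsets of roughly equal weight.

    A threshold t puts values <= t in the lower bin.
    """
    ordered = sorted(records)
    total = sum(w for _, w in ordered)
    thresholds: List[float] = []
    cumulative = 0.0
    k = 1  # index of the next boundary to place
    for value, weight in ordered:
        cumulative += weight
        while k < bins and cumulative >= k * total / bins:
            if not thresholds or thresholds[-1] != value:
                thresholds.append(value)
            k += 1
    return thresholds


if __name__ == "__main__":
    # Heavy first record: it alone carries half the total weight.
    weighted = [(1.0, 3.0), (2.0, 1.0), (3.0, 1.0), (4.0, 1.0)]
    print(uniform_weight_thresholds(weighted, 2))  # [1.0]

    # One extreme outlier stretches uniform-range bins; trimming 2% drops
    # one record from each end, including the outlier.
    data = [(float(v), 1.0) for v in range(100)] + [(1e6, 1.0)]
    trimmed = trim_extremes(data, 2)
    print(uniform_range_thresholds([v for v, _ in trimmed], 4))  # [25.5, 50.0, 74.5]
```

The weighted example shows how uniform-weight bins differ from equal-count bins: the single heavy record fills the lower bin on its own.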
4.3 Changes and Additions in MineSet 1.2
+o New capabilities allow MineSet visualizations to be
displayed over the web. Mtr files allow MineSet
visualizations to be shipped over the web to other
machines running MineSet, while rview allows MineSet
applications to be run on the server and displayed over
the network to other machines running X11 and OpenGL,
regardless of whether they have MineSet installed. For
more information, see the directory
/usr/lib/MineSet/www.
+o Those tools that support the execute command have a
-warnexecute option which will issue a warning the
first time that they try to execute a user-specified
command. This can be turned on permanently by adding
the line
*minesetWarnExecute:TRUE
to your .Xdefaults file.